Multi-Level Cluster Indicator Decompositions of Matrices and Tensors
Abstract
A central challenge for many machine learning and data mining applications is that the amounts of data and features are very large, so low-rank approximations of the original data are often required for efficient computation. We propose new multi-level, clustering-based low-rank matrix approximations that are comparable to, and even more compact than, the Singular Value Decomposition (SVD). We use the cluster indicators produced by data clustering to form the subspaces, so our decomposition results are more interpretable. We further generalize our clustering-based matrix decompositions to tensor decompositions, which are useful in high-order data analysis. We also provide an upper bound for the approximation error of our tensor decomposition algorithm. In all experimental results, our methods significantly outperform traditional decomposition methods such as SVD and high-order SVD.

Introduction

Matrix/tensor decomposition approaches serve as both data compression and unsupervised learning techniques. They have been successfully applied in a broad range of artificial intelligence and machine learning applications, including document analysis (Deerwester et al. 1990), bioinformatics (Homayouni et al. 2005), computer vision (Lathauwer, Moor, and Vandewalle 2000; Ding, Huang, and Luo 2008; Ye 2004), inference under uncertainty (Wood and Griffiths 2006), and approximate reasoning (Smets 2002). Many other applications are reviewed by Acar and Yener (2008) and Kolda and Bader (2008). For matrices, the Singular Value Decomposition (SVD) is the best-known and most widely used method because it provides the best low-rank approximation. For higher-order tensors, multi-linear analysis approaches have been developed that investigate the projections among multiple factor spaces, e.g., High-Order SVD (HOSVD) (Lathauwer, Moor, and Vandewalle 2000; Vasilescu and Terzopoulos 2002), which is widely used in data mining.
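The claim that truncated SVD provides the best low-rank approximation (the Eckart-Young theorem) can be checked numerically. A minimal NumPy sketch, independent of the paper's own method:

```python
import numpy as np

# Truncated SVD gives the best rank-k approximation in Frobenius norm
# (Eckart-Young). Illustrative sketch on a random matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 30))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 5
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k reconstruction

# The residual energy is exactly the energy in the discarded singular values.
err = np.linalg.norm(X - X_k, "fro")
```

For any rank-k matrix B, ||X - B||_F >= sqrt(sum of the squared singular values beyond the k-th), with equality for X_k above.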
As standard unsupervised learning approaches, however, traditional matrix and tensor decompositions often generate results that are not interpretable, so further analysis is needed. In this paper, one of our major contributions is a decomposition approach that directly outputs interpretable results using clustering-based low-rank matrix/tensor approximations. This decomposition uses cluster indicators, which have the nice property that they can be compacted into a single vector. We use the indicator space as the subspace basis and project the data onto it. The immediate advantage of this approach is that it requires much less storage. Another advantage is that matrix/tensor reconstruction becomes extremely efficient compared with traditional decompositions. We further introduce multi-scale versions of our method, which yield much lower approximation error. Our Cluster Indicator Decomposition (CID) and Multi-Level Cluster Indicator Decomposition (MLCID) are compact: comparable to, and even more compact than, SVD. Figure 1 visually compares the original images, the SVD reconstructions, and the MLCID reconstructions; the results of our method are clearly better than those of SVD. The details of the experimental setup can be found in the Experiments section. We also generalize our MLCID methods to tensor decompositions that perform clustering along each dimension and use the cluster indicators to form the subspaces of the tensor decomposition. Another contribution of this paper is a theoretical analysis of the proposed approach, which provides a tight upper bound on the approximation error of the decompositions. Empirical results show that our MLCID and tensor MLCID decompositions outperform state-of-the-art methods such as SVD and HOSVD on data sets with clustering structure.

Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
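The cluster-indicator idea described above can be sketched as follows. This is a minimal illustration, not the authors' exact CID/MLCID algorithm: cluster the rows of X with plain k-means, encode the result as a binary indicator matrix F (n x k), and approximate X by F G with G = (F^T F)^{-1} F^T X (the cluster centroids). Because every row of F contains a single 1, F compresses to one integer label vector, which is the storage advantage the paper describes.

```python
import numpy as np

def cluster_indicator_approx(X, k, n_iter=100, seed=0):
    """Sketch: k-means labels -> indicator matrix F -> least-squares G."""
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct data points (Forgy initialization).
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    labels = np.full(X.shape[0], -1)
    for _ in range(n_iter):                      # plain Lloyd's k-means
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):                       # update non-empty clusters
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    F = np.eye(k)[labels]                        # n x k cluster indicator matrix
    G = np.linalg.pinv(F) @ X                    # least-squares coefficients
    return labels, F, G

rng = np.random.default_rng(1)
# Toy data with clear cluster structure: three shifted Gaussian blobs.
X = np.vstack([rng.standard_normal((20, 10)) + 6.0 * j for j in range(3)])
labels, F, G = cluster_indicator_approx(X, k=3)
X_hat = F @ G                                    # reconstruction is a row lookup
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Storage here is n integer labels plus the k x m centroid matrix G, versus the n x k and k x m dense factors of a rank-k SVD; reconstruction is a table lookup rather than a matrix product.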
Similar Resources
Low-rank Matrices in the Approximation of Tensors
Most successful numerical algorithms for multi-dimensional problems usually involve multi-index arrays, also called tensors, and capitalize on those tensor decompositions that reduce, one way or another, to low-rank matrices associated with the given tensors. It can be argued that most of the recent progress is due to the TT and HT decompositions [1]. The differences between the two decompositio...
Era of Big Data Processing: A New Approach via Tensor Networks and Tensor Decompositions
Many problems in computational neuroscience, neuroinformatics, pattern/image recognition, signal processing and machine learning generate massive amounts of multidimensional data with multiple aspects and high dimensionality. Tensors (i.e., multi-way arrays) often provide a natural and compact representation for such massive multidimensional data via suitable low-rank approximations. Big data a...
Introduction to Tensor Decompositions and their Applications in Machine Learning
Tensors are multidimensional arrays of numerical values and therefore generalize matrices to multiple dimensions. While tensors first emerged in the psychometrics community in the 20th century, they have since then spread to numerous other disciplines, including machine learning. Tensors and their decompositions are especially beneficial in unsupervised learning settings, but are gaining popularit...
Tensor Decompositions for Very Large Scale Problems
Modern applications such as neuroscience, text mining, and large-scale social networks generate massive amounts of data with multiple aspects and high dimensionality. Tensors (i.e., multi-way arrays) provide a natural representation for such massive data. Consequently, tensor decompositions and factorizations are emerging as novel and promising tools for exploratory analysis of multidimensional...
Orthogonal Rank Decompositions for Tensors
The theory of orthogonal rank decompositions for matrices is well understood, but the same is not true for tensors. For tensors, even the notions of orthogonality and rank can be interpreted in several different ways. Tensor decompositions are useful in applications such as principal component analysis for multiway data. We present two types of orthogonal rank decompositions and describe methods to...
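The "reduce to low-rank matrices" theme running through the resources above can be made concrete with a tensor-train (TT) sketch: a d-way tensor is factored into a chain of 3-way cores by repeated reshaping and truncated SVD. This is a minimal illustration of the standard TT-SVD procedure, not tied to any one of the papers listed:

```python
import numpy as np

def tt_svd(T, eps=1e-10):
    """Factor tensor T into TT cores via repeated reshape + truncated SVD."""
    shape = T.shape
    cores, r_prev, C = [], 1, T
    for k in range(len(shape) - 1):
        C = C.reshape(r_prev * shape[k], -1)     # unfold current remainder
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = int(np.sum(s > eps * s[0]))          # numerical rank truncation
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        C = s[:r, None] * Vt[:r]                 # carry the remainder forward
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the chain of cores back into a full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([out.ndim - 1], [0]))
    return out.squeeze(axis=(0, -1))

rng = np.random.default_rng(0)
# A rank-1 test tensor (outer product of three vectors): all TT ranks are 1.
T = np.einsum("i,j,k->ijk", rng.standard_normal(4),
              rng.standard_normal(5), rng.standard_normal(6))
cores = tt_svd(T)
T_hat = tt_reconstruct(cores)
```

Every step is an ordinary matrix SVD, so the tensor problem reduces entirely to low-rank matrix factorizations.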
Publication year: 2011